Add maxfailures option to limit test failures before stopping#61560
Add maxfailures option to limit test failures before stopping#61560VanitasCodes wants to merge 4 commits intoJuliaLang:masterfrom
Conversation
|
@oscardssmith Is this along the lines of what you had in mind? |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a maxfailures mechanism to Julia’s Test stdlib to stop a test run after a configurable number of failures/errors, without changing default @testset behavior.
Changes:
- Introduces global atomic failure counter + limit, plus exported setters/getters/reset helpers.
- Adds
MaxFailuresErrorand integrates it with existing failfast propagation/printing paths. - Adds stdlib tests validating default behavior, stopping at N failures, counting errors, and invalid input.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| stdlib/Test/src/Test.jl | Implements global max-failures tracking, exports API, adds MaxFailuresError, and wires it into record and top-level printing. |
| stdlib/Test/test/runtests.jl | Adds integration-style tests that spawn a Julia process to verify max-failures behavior and messaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 1", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 2", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 1", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
|
Overall, I think this is the right direction (although seeing the changes in Pkg would be useful to see this in use). |
|
Thanks for putting this together!
Do those functions need to be exported (or public)? |
|
@oscardssmith I have a few questions before I revise. You mentioned the default should be 0 for backward compatibility. I originally went with You're completely right about the atomics. I was overthinking the concurrency case, but tests run sequentially in a single process anyway. I'll switch to a simple For the API, if we make the limit nonatomic and just use direct access like Should I draft a companion PR for Pkg showing how |
|
@DilumAluthge I originally exported them thinking they might be useful for custom test runners, but @oscardssmith pointed out they probably shouldn't be. I think you're right that they should either be |
Keeping them internal (non-public) sounds good to me! |
|
@oscardssmith I've pushed an updated version based on your feedback. The atomics are gone and the globals are now plain @DilumAluthge Updated the PR to remove the exports entirely as you and @oscardssmith suggested. The globals are now accessed directly via |
db67957 to
af210e1
Compare
|
@oscardssmith The count is now a |
af210e1 to
130536b
Compare
|
@oscardssmith Is there anything else you'd like me to address before this is ready? |
|
Seems ready to me. I'd like to see the Pkg side PR before merging to get a real user of this, but the code here looks correct. |
|
@oscardssmith I'll put together a draft PR for the Pkg side showing the integration. I'll link it here once it's ready. |
|
@VanitasCodes Just wanted to check- have you used any AI tools when working on this PR? Either to generate the code or to write the PR description or comments? |
I have had some assistance for the code, yes. But it was majorly used for understanding and navigating through the codebase more effectively. |
|
@oscardssmith quick update before I push. While testing locally I found that when Also while trying to test the Pkg integration I realized Pkg isn't actually tracked in the Julia repo so the |
Yeah, that's right. |
130536b to
2e98e91
Compare
|
@oscardssmith just pushed the LoadError unwrapping fix. Can you have a look at it and lemme know if the changes look right? |
Fixes #21594
Fixes #23375
This came out of the triage discussion on #61483, where @oscardssmith suggested that instead of changing the default behavior of
@testset, we should add amaxfailuresoption that test runners can use to control how many failures are tolerated before stopping execution. This PR implements the Test stdlib side of that.There's a global atomic counter that tracks failures and errors across testsets, and a configurable limit. When the count hits the limit, a
MaxFailuresErroris thrown and execution stops. The default limit istypemax(Int), so existing behavior is completely unchanged unless you explicitly callset_max_failures.Four new functions are exported from Test:
set_max_failures(n)- set the limitget_max_failures()- read the current limitget_failure_count()- read the current failure countreset_failure_count()- reset the counter to zeroThe
MaxFailuresErrorintegrates with the existingis_failfast_errormachinery, so it propagates correctly through nested testsets the same wayFailFastErrordoes. If bothfailfastandmaxfailuresare set,failfasttakes precedence since it's checked first inrecord.The intended usage is for something like
Pkg.test(; maxfailures=10)to callTest.set_max_failures(10)before running tests, which would be a follow-up PR.